ABSTRACT


The aim of this article is to set out a bookkeeping procedure by
which a scientist (using the term flexibly) may compare the conclusions
of a theory with facts obtained by reduction of observational data, with
the aim of assessing the hypothesis on which the theory is based. It is
argued that the appropriate formalism is probability theory, and that the
key process is the inductive process as represented by Bayes' theorem,
which indicates how the degree of belief in a hypothesis should be
adjusted in response to new information.

The following model of the inductive process in science is adopted.
Between observation and theory there is an "interface" which comprises
a set of independent items: each item comprises a complete set of
mutually exclusive statements. It must be possible to assign two
probabilities to each of these statements, one by "reduction" of
observational data, and the other by theoretical analysis of the
considered hypothesis. The "observational" probabilities must be free
from theoretical bias and vice versa. Formulas are derived which show
(a) how the assumed probability of the hypothesis should be adjusted in
response to information concerning one item, and (b) how such estimates
concerning more than one item may be combined.

The model further requires that one should consider not one hypothesis
but a complete set of mutually exclusive hypotheses. It is necessary
to reconcile this requirement with the normal situation that a scientist
has one or two specific theories to evaluate, the hypotheses of which
do not form a complete set. A procedure is proposed to overcome this
difficulty. One may compare a real analysis (or analyses) of a specified
hypothesis (or hypotheses) with a "null" analysis of the complement of
this hypothesis (or hypotheses). A "null" analysis is that which admits
complete ignorance about the conclusions to be drawn from a hypothesis:
the relevant probabilities may therefore be determined without specific
knowledge of the hypothesis. If a specified theory fares worse than a
"null" theory, it is a bad theory.

The method is illustrated by a "worksheet" indicating the way in
which a few observational facts about pulsars may be used to appraise
both the neutron-star hypothesis and the white-dwarf hypothesis.

1. INTRODUCTION

This article is concerned with the role of induction in scientific
research. However, the principal aim is not to undertake a philosophical
inquiry per se, but rather to set up a bookkeeping procedure for
organizing the judgments involved in comparing a scientific theory with
scientific data. For my own convenience, I shall draw examples from
astrophysics, but I hope and believe that the ideas and methods could be
useful in other fields also.

A monograph on quasars¹ by Kahn and Palmer gives an example of the
type of judgment which scientists must attempt to make. What is unusual
about the example is that the authors have the candor to present their
judgment in numerical form. Table 3, on page 111 of that monograph, gives
the "estimated probability of correctness" of six hypotheses concerning
quasars. A similar appraisal was made by Professor L. Woltjer at the
Conference on Seyfert Galaxies held at Tucson, Arizona in February 1968.

Although judgments of the type quoted above provide an effective
means of communicating degrees of belief, they immediately raise a
significant question: "How were these estimates arrived at?" In the
examples quoted, there is no hint of the answer to this question. An
estimate of this type, without a description of the process by which it
was made, invites controversy. The principal aim of this article is to
present a procedure for arriving at estimates of this type. This does not
guarantee that there will be no controversy, but it should help to
localize the area of disagreement and so make the controversy more
profitable.

The examples quoted have already established one relevant point:
The appropriate formalism to use in investigating this problem is
probability theory. Our aim then is to set up a model for the reasoning
process involved in evaluating a scientific theory, and to analyze this
model by the theory of probability. As I. J. Good has remarked,
"Probability is a part of reasoning and is therefore more fundamental
than most theories." It is in this sense that the present article may be
regarded as a "theory of theories."

Before proceeding, it is important to state that we shall not be
concerned with a possible comparison between "perfect data"
(observational or experimental) and a "perfect theory." Even if we knew
what these terms meant, their discussion would be irrelevant to everyday
life. Our aim is to determine how one can make a judgment about a theory
which is admittedly uncertain and incomplete, in comparing it with data
which are uncertain and incomplete. We further recognize that such an
evaluation must be made not once, but progressively, as the data come in
and as the theory develops. The following remarks of Jeffreys are
relevant here: "Either we can learn from experience or we cannot. The
ability to learn from experience demands the concept of probability in
relation to varying data, and the recognition of the meanings of 'more
probable than' and 'less probable than.'"

2.  INDUCTION  AND  BAYES'  THEOREM 

Although textbooks frequently represent a science, such as physics,
as being deductive, scientists are well aware that this is a
characteristic of textbooks, not of science. P. G. Bergman, speaking at
the Fourth Texas Conference on Relativistic Astrophysics in Dallas in
1968, stated, "'Let the facts lead you where they may' is a gross
oversimplification of how to proceed in science and not necessarily
philosophically justified." The difference between deduction and
induction has been pointed out very clearly by Polya by means of the
following examples.

The  basic  reasoning  process  of  the  deductive  type  is  the  syllogism: 

A  implies  B. 

B  is  false. 

Therefore A is false.

By  contrast,  the  inductive  process  follows  a  pattern  such  as  the 
following: 

A  implies  B. 

B  is  true. 

Therefore  A  is  more  credible. 

The above example, although descriptive, does not lend itself to
numerical evaluation. The basic procedure for making quantitative
estimates of inductive arguments is provided by Bayes' theorem. According
to Jeffreys, "This theorem is to the theory of probability what
Pythagoras's theorem is to geometry."


We introduce the notation $(A \mid B)$ to denote the probability that
proposition A is true on the basis of the knowledge that proposition B
is true. We adopt the convention that the measure of probability extends
over the range 0 to 1: $(A \mid B) = 0$ if A is impossible given B; and
$(A \mid B) = 1$ if A is certain given B.

The notation AB stands for the "product" of the two propositions
A and B. Then AB is true if and only if both A and B are true. The
"product rule" of probability theory then states that

$$(AB \mid C) = (A \mid BC)\,(B \mid C) . \tag{2.1}$$

However, since AB = BA, this may alternatively be written as

$$(AB \mid C) = (B \mid AC)\,(A \mid C) . \tag{2.2}$$


We now see from these two equations that

$$(A \mid BC) = \frac{(B \mid AC)}{(B \mid C)}\,(A \mid C) . \tag{2.3}$$

This is Bayes' theorem.

In order to show the relationship of this theorem to the scientific
method, we consider the following "model" of the scientific process. A
certain hypothesis H is to be evaluated by comparing the theoretical
consequences of this hypothesis with observation (either of the world as
we find it or of a contrived situation called an "experiment"). Note here
an important point: It must be possible to formulate a statement which
makes sense either when it is regarded as the consequence of a theoretical
analysis, or when it is regarded as the result of an observation. Since
this statement forms a crucial "link" between theory and observation, we
use for it the symbol L. We also introduce the symbol X to denote all
information available to the scientist in addition to (and, we assume,
preceding) his knowledge, derived from observation, that L is true. Then
Bayes' theorem may be rewritten as

$$(H \mid LX) = \frac{(L \mid HX)}{(L \mid X)}\,(H \mid X) . \tag{2.4}$$


The term $(H \mid X)$ is the "prior probability" of the hypothesis H
(prior, that is, to knowledge of L). The term $(H \mid LX)$ is the "post
probability" of H, based on knowledge of both X and L. Bayes' theorem
then tells us that the post probability equals the prior probability
multiplied by a certain factor $(L \mid HX)/(L \mid X)$, which is
sometimes termed the "likelihood." This is the ratio of the probability
that L is true, given H and information X, to the probability that L is
true, given only the information X.

Equation (2.4) gives us a procedure for adjusting our degree of
belief in a hypothesis on the basis of incoming observational evidence.
If H is irrelevant to L, then $(L \mid HX) = (L \mid X)$ so that
$(H \mid LX) = (H \mid X)$: the probability is, appropriately enough, not
affected by knowledge that L is true. If L seems quite likely on the
basis of knowledge X, but very unlikely on the basis of X and the
hypothesis H, then the probability that H is true is greatly reduced by
the knowledge that L is true. If, on the other hand, L seems quite
unlikely on the basis of information X alone, but quite likely on the
basis of information X and the hypothesis H, then the fact that L is
known (observationally) to be true greatly increases the probability
that H is true.
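
The three cases just described can be illustrated numerically. The
following is a minimal Python sketch of Equation (2.4), not a definitive
implementation. It assumes that $(L \mid X)$ is expanded over H and its
negation by the sum and product rules; the function name `posterior` and
all numerical values are hypothetical and chosen only for illustration.

```python
def posterior(prior_h, like_h, like_not_h):
    """Return (H|LX) from (H|X), (L|HX) and (L|not-H,X) using Eq. (2.4)."""
    # Assumption: (L|X) = (L|HX)(H|X) + (L|not-H,X)(not-H|X).
    evidence = like_h * prior_h + like_not_h * (1.0 - prior_h)
    if evidence == 0.0:
        raise ValueError("(L|X) = 0: L is incompatible with X")
    return prior_h * like_h / evidence


# L is unlikely on X alone but likely given H: belief in H increases.
raised = posterior(prior_h=0.2, like_h=0.9, like_not_h=0.1)

# L is likely on X alone but unlikely given H: belief in H is reduced.
lowered = posterior(prior_h=0.2, like_h=0.05, like_not_h=0.9)

# H is irrelevant to L: belief in H is unchanged.
unchanged = posterior(prior_h=0.2, like_h=0.5, like_not_h=0.5)

print(raised, lowered, unchanged)
```

With these invented inputs the first case raises the probability of H
above its prior value of 0.2, the second lowers it, and the third leaves
it where it was.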

It is worth noting a few further aspects of this theorem.

1. If $(L \mid X) = 0$, there is something wrong with our information
X, since L is incompatible with X, yet L is observed to be true.

2. Assuming that $(L \mid X) \neq 0$, $(H \mid LX) = 0$ if
$(H \mid X) = 0$. That is, an impossible hypothesis remains impossible,
no matter what the evidence.

3. Denoting by $\bar{H}$ the negative of H, so that "$\bar{H}$ is true"
means "H is not true", we see from (2) that $(\bar{H} \mid LX) = 0$ if
$(\bar{H} \mid X) = 0$. This may be stated alternatively as
$(H \mid LX) = 1$ if $(H \mid X) = 1$, since by the sum rule of
probability theory,⁷

$$(H \mid X) + (\bar{H} \mid X) = 1, \text{ etc.} \tag{2.5}$$

In other words, if H is certain before we have knowledge of L,
it will be certain after we have knowledge of L, no matter what
L may be.
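
The persistence of the values zero and unity can be checked
numerically. The sketch below is illustrative only: the helper `update`,
the likelihood values, and the supposition that Bayes' theorem may be
applied to five successive observations of the same kind are all
assumptions made for this example.

```python
def update(prior_h, like_h, like_not_h):
    """One application of Eq. (2.4), with (L|X) expanded over H and not-H."""
    evidence = like_h * prior_h + like_not_h * (1.0 - prior_h)
    return prior_h * like_h / evidence


impossible = 0.0    # (H|X) = 0
certain = 1.0       # (H|X) = 1
open_minded = 0.999 # nearly certain, but not dogmatic

for _ in range(5):
    # Evidence strongly favouring H cannot rescue an "impossible" H.
    impossible = update(impossible, like_h=0.99, like_not_h=0.01)
    # Evidence strongly against H cannot dislodge a "certain" H.
    certain = update(certain, like_h=0.01, like_not_h=0.99)
    # The same adverse evidence does move a prior just short of unity.
    open_minded = update(open_minded, like_h=0.01, like_not_h=0.99)

print(impossible, certain, open_minded)
```

The first two values never move, whereas the prior of 0.999 collapses
toward zero under the same adverse evidence.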

We see from the above paragraph that one should be very careful about
assigning probability zero or unity to any proposition, since this entails
that we can never change these estimates, no matter what subsequent
information may turn up. I. J. Good has some useful advice on this point:
"Probability judgments can be sharpened by laying bets at suitable odds.
If people always felt obliged to back their opinions when challenged, we
would be spared a few of the 'certain' predictions that are so freely
made." Whether or not we make wagers, we should, for the sake of future
credibility, be very cautious about making "certain" theoretical
predictions or stating "certain" observational facts: Theorists sometimes
find a calculation to be wrong, and observers sometimes find that their
results are not supported by subsequent observations by other groups.

3.  MODEL  OF  THE  INDUCTIVE  PROCESS  IN  SCIENCE 

Equation (2.4) of the preceding section, and the discussion which
followed it, give a rough approximation to the inductive process in
science. The simple model given in Section 2 would be adequate in a
simple situation in which theory could predict the reading which one
should obtain from a particular measuring device such as a meter. This,
however, is not the usual situation in science. It may take a great deal
of juggling and maneuvering to find a quantity which can be both measured
and calculated.

Indeed, although one thinks of measurement as being the key process
in exact sciences, many of the comparisons made between theory and
observation are not normally expressed in terms of measurable quantities.
For instance, one may require of a physical theory that it should be
covariant under some transformation (Galilean, Lorentz, etc.). In the
early studies of quasars, one of the most important questions was to
determine whether the objects are "intragalactic" or "extragalactic."
Similarly, the nature of pulsars for some time hinged upon the question
"Is a pulsar a white dwarf or a neutron star?"

Our first aim in this section is to set up a "model" for the inductive
process in science which is a closer approximation to the methods
actually used than the simple model presented in Section 2. The essential
requirement is to be able to make a comparison between theory and
observation. Hence a key requirement is for an "interface" between theory
and observation. It seems that the basic requirement for such an
interface is that there should be a language which can be understood both
by theorists and by observers. More specifically, we adopt the following
definition of the interface for the purpose of constructing a model: The
interface comprises a number of statements, each of which is both (a)
a possible result of data reduction of observations, and (b) a possible
consequence of theoretical analysis of a hypothesis under consideration.

It is convenient to make further assumptions about these statements.
We assume that they may be arranged into groups: Each group of statements
will be termed an "item." We assume that there is a finite set of items
$I_\alpha$, $\alpha = 1, 2, \ldots, A$.

With each item $I_\alpha$ there is to be associated a group of
statements which, for present convenience, we assume to be finite in
number. (This assumption can be relaxed without difficulty and must be
relaxed when dealing with statements about continuous variables.) This
set of statements will be represented by $S_{\alpha n}$,
$n = 1, 2, \ldots, N_\alpha$. For convenience, we assume that the group
of statements associated with any item are mutually exclusive and
complete. That is, for any item such as $I_\alpha$, one and only one of
the statements $S_{\alpha n}$ is true.

When a science is well established, one tends to forget that the
theory of that science is simply a construct, the validity of which is
to be established by comparison with observations. Indeed, one can easily
come to regard the abstract construct as having a "reality" of its own.
A remark by Eddington is particularly appropriate here: "Physical science
may be defined as 'the systematization of knowledge obtained by
measurement.' It is a convention that this knowledge shall be formulated
as a description of a world -- called the 'physical universe'." It is
necessary for us to distinguish between observational data and
theoretical data, and to consider explicitly the connection between them.

We first consider what goes on on the observational side of the
interface. An astronomer obtains his information by means of photographs,
spectra, radio records, etc., whichever happens to be the output of the
observational instrument he is using. An experimenter collects similar
"raw data." However, neither observer nor experimenter transmits this
raw data to his scientific colleagues. The transaction between the
observer (this term being used to include "experimenter") and the
theorist is typically the publication of an article in which the
observational results are presented and analyzed. Sometimes the
conclusions which the observer draws from his data have almost the
certainty of mathematical deduction. Usually, however, there are a number
of assumptions and provisos which are explicitly or implicitly involved
in going from the data to the conclusions. Furthermore, this process of
"data reduction" probably involves the use of theoretical knowledge. One
must pay careful attention to this aspect of data reduction, if one's aim
is to compare the results of the observation with one or more theories.

The basic rule of data reduction is that, if theoretical knowledge
is to be used, it should be carefully specified, and preferably should
comprise only theoretical knowledge which is beyond dispute. If the aim
of an observation is to obtain information with which to evaluate one or
more theoretical hypotheses, the data reduction must studiously avoid any
steps which explicitly or implicitly appeal to these hypotheses.

Data reduction can range from a very simple to a very sophisticated
operation. A new technique in data reduction may represent an important
step forward in a science. The construction of the Hertzsprung-Russell
diagram¹⁰,¹¹ is a case in point. In looking at raw data, one may not be
able to see the wood for the trees. The aim of good data reduction is to
enable one to see the shape of the forest. A Hertzsprung-Russell diagram
might be prepared from many precise observations, yet the significant
information in the diagram may comprise the approximate clustering of
points to form a simple and not-too-well-defined curve.

In the present model, the result of data analysis is to provide
varying degrees of support for alternative statements of each item
$I_\alpha$, on the basis of observational knowledge which we denote by
O. In this model, the sum total of observational knowledge, as it may be
used for comparison with theory, is provided by the set of probabilities
$(S_{\alpha n} \mid ROX)$. R denotes the process of data reduction. This
symbol should be introduced since different reduction procedures may lead
to different estimates of the reduced data.

Since, for each $\alpha$, the set of statements $S_{\alpha n}$ are
complete and mutually exclusive, we see from the sum rule of probability
theory that

$$\sum_{n=1}^{N_\alpha} (S_{\alpha n} \mid X) = 1 , \qquad
  \sum_{n=1}^{N_\alpha} (S_{\alpha n} \mid ROX) = 1 . \tag{3.1}$$

In the present context, X denotes all information which is not in
question, including theoretical knowledge used in the data reduction and,
conversely, whatever observational knowledge the theorist is to be
permitted to use in his analyses.
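
The bookkeeping implied by the normalization condition on the reduced
data can be sketched in Python as follows. This is a hedged illustration
only: the item names, statement labels and probability values are
invented and are not taken from any observation, and `is_normalized` is
a hypothetical helper.

```python
import math

# Hypothetical reduced data: item -> {statement: probability given R, O, X}.
reduced = {
    "location": {"intragalactic": 0.3, "extragalactic": 0.7},
    "period": {"stable": 0.6, "drifting": 0.3, "erratic": 0.1},
}


def is_normalized(item, tol=1e-9):
    """Check that one item's statement probabilities lie in [0, 1] and sum to 1."""
    probs = list(item.values())
    in_range = all(0.0 <= p <= 1.0 for p in probs)
    return in_range and math.isclose(sum(probs), 1.0, abs_tol=tol)


# One check per item: each complete, mutually exclusive set must sum to unity.
checks = {name: is_normalized(item) for name, item in reduced.items()}
print(checks)
```

An item whose probabilities fail this check signals either an incomplete
set of statements or statements that are not mutually exclusive.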

We now look at the other side of the fence, and inquire about
operations on the theoretical side of the interface. The basic procedure
has been described by Jeffreys,¹² in his discussion of "theory": "The use
of the word 'theory' in several different senses is perhaps responsible
for a good deal of confusion. What I prefer to call an 'explanation'
consists of several parts: First, a statement of a hypothesis; secondly,
the systematic development of its consequences; thirdly, the comparison
of these consequences with observation."

However, Jeffreys¹³ elsewhere makes the following relevant and
qualifying remarks: "We get no evidence for a hypothesis by merely
working out its consequences and showing that they agree with some
observations, because it may happen that a wide range of other hypotheses
would agree with those observations equally well. To get evidence for it
we must also examine its various contradictories and show that they do
not fit the observations. This elementary principle is often overlooked
in alleged scientific work, which proceeds by stating a hypothesis,
quoting masses of results of observation that might be expected on the
hypothesis and possibly on several contradictory ones, ignoring all that
would not be expected on it, but might be expected on some alternative,
and claiming that the observations support the hypothesis. ... So long as
alternatives are not examined and compared with the whole of the relevant
data, a hypothesis can never be more than a considered one."

We see from the above that the principal job of a theorist is to
determine by theoretical analysis the consequences of one or more
hypotheses. As Jeffreys points out, one obtains the strongest evidence
for a hypothesis when one analyzes, and compares with observations, this
hypothesis and also whatever additional hypotheses are necessary to form
a complete set. An important case for us to consider is, therefore,
that we are considering a set of hypotheses $H_i$, $i = 1, \ldots, I$,
which are mutually exclusive and form a complete set. We assume that each
of these hypotheses is subjected in turn to an analysis A so that one
arrives at probabilities for the statements $S_{\alpha n}$, where these
statements are now regarded as consequences of each hypothesis considered
in turn. In this way we would arrive at the probabilities
$(S_{\alpha n} \mid A H_i X)$. It may well be that, if the problem is
sufficiently well defined, each hypothesis $H_i$ implies a definite
statement $S_{\alpha n}$ of each item $I_\alpha$, so that each of the
probabilities $(S_{\alpha n} \mid A H_i X)$ would be either unity or
zero. However, there may be some uncertainty in the basic information X,
and it is normally the case that scientific analysis is incomplete and
imperfect. Hence we should expect that, in general, the probabilities
$(S_{\alpha n} \mid A H_i X)$ will have values between zero and unity.

Although it is most desirable to consider any hypothesis under
consideration as a member of a complete set, and to analyze all members
of that set, this will often be impractical in normal scientific
theoretical work. It may be that the hypotheses can be identified, but
are too many in number for each of them to be analyzed. Another
possibility is that it is very difficult, or practically impossible, to
identify hypotheses which build up a given hypothesis into a complete
set. It is therefore important for us to have some way of proceeding
which does not hinge upon the explicit identification and analysis of a
complete set of hypotheses.

Jeffreys recognizes that a well established hypothesis will be
accepted simply on the basis of the agreement between consequences of
that hypothesis and observation. He states,¹⁴ "The chief advances in
modern physics ... were achieved by the method of Euclid and Newton: to
state a set of hypotheses, work out their consequences, and assert them
if they accounted for most of the outstanding variation." It is therefore
necessary for us to specify a procedure which involves the detailed
specification and analysis of one or more hypotheses
$H_1, \ldots, H_I$, in the case that this set of hypotheses does not
form a complete set. We assume, however, that the hypotheses are chosen
to be mutually exclusive.

We assume that it is possible to expand this set of hypotheses to
form a complete set by adding a hypothesis $H_0$ which excludes and is
excluded by any one of the hypotheses $H_i$, $i = 1, \ldots, I$. (If the
set were to be specified explicitly, it might be more convenient to
express $H_0$ as the disjunction of a large number of hypotheses, but for
present purposes this consideration is irrelevant.) Thus the set of
hypotheses $H_0, H_1, \ldots, H_I$ are mutually exclusive and form a
complete set.

We now ask what information we can obtain about the hypothesis $H_0$
without specifying the hypothesis or carrying out an analysis of the
hypothesis. The answer is, of course, that in this circumstance we must
remain ignorant about $H_0$. However, this does not mean that we cannot
fit $H_0$ into our model of the inductive process in science. We have
recognized that theoretical analysis is in practice imperfect. We can
introduce, as an extreme case, a "null" analysis $A_0$ which gives no
information whatever about the consequences of the hypothesis to which
it is applied. It is then possible to maintain the formalism which is
based on the assumption that we are analyzing the consequences of a
complete set of hypotheses, by adopting the following stratagem: first,
we assume that the known hypotheses $H_1, \ldots, H_I$ are supplemented
by a hypothesis $H_0$ to form a complete set; second, we assume that,
whereas each hypothesis $H_1, \ldots, H_I$ is subjected to a proper
analysis A, the hypothesis $H_0$ (which is not to be specified
explicitly) is assumed to be subjected to the "null" analysis $A_0$. We
shall, for simplicity of notation, suppress the symbols A and $A_0$ in
the remainder of this section.

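Now suppose that we chose to identify $(S_{\alpha n} \mid H_0 X)$ with
$(S_{\alpha n} \mid X)$. Since the hypotheses $H_0, H_1, \ldots, H_I$
form a complete set, we know that $H_0 + H_1 + \ldots + H_I$ is true,
where the summation sign here indicates a "logical sum," i.e., "and/or."
Hence

$$(S_{\alpha n} \mid X) = (S_{\alpha n} [H_0 + H_1 + \ldots + H_I] \mid X) . \tag{3.2}$$

On noting that the hypotheses $H_0, H_1, \ldots, H_I$ are mutually
exclusive, we see from the sum rule of probability theory⁷ that

$$(S_{\alpha n} \mid X) = \sum_{i=0}^{I} (S_{\alpha n} H_i \mid X) . \tag{3.3}$$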


We may now use the product rule to write

$$(S_{\alpha n} H_i \mid X) = (S_{\alpha n} \mid H_i X)\,(H_i \mid X) \tag{3.4}$$

so that

$$(S_{\alpha n} \mid X) = \sum_{i=0}^{I} (S_{\alpha n} \mid H_i X)\,(H_i \mid X) . \tag{3.5}$$


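If we now make the choice

$$(S_{\alpha n} \mid H_0 X) = (S_{\alpha n} \mid X) , \tag{3.6}$$

we see that

$$(S_{\alpha n} \mid H_0 X) = [1 - (H_0 \mid X)]^{-1} \sum_{i=1}^{I} (S_{\alpha n} \mid H_i X)\,(H_i \mid X) . \tag{3.7}$$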


It follows from this equation that if a particular statement
$S_{\alpha n}$ is impossible on the basis of hypotheses
$H_1, \ldots, H_I$, then it must be considered to be impossible on the
basis of hypothesis $H_0$ also. This represents a defect of the
convention for $(S_{\alpha n} \mid H_0 X)$ specified by Equation (3.6).
We would like the probabilities $(S_{\alpha n} \mid H_0 X)$ to be
"maximally noncommittal" about the statements $S_{\alpha n}$, subject
only to restrictions imposed by the information X. The manner in which
one may ascribe "maximally noncommittal" values to a set of statements,
taking account of possible information about the statements, has been
discussed by Jaynes.¹⁵ We will not pursue this point here. For our
purpose, it is important only to note that it is necessary to specify the
probabilities $(S_{\alpha n} \mid H_0 X)$ in as noncommittal a manner as
possible, and that it is not desirable to adopt the convention (3.6).
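
The defect of convention (3.6) can be seen in a small numerical
sketch. The Python below is illustrative only: the function names, the
prior values and the likelihood values are hypothetical, and the uniform
assignment 1/N used as a "noncommittal" choice is an assumption made for
this example (it corresponds to having no relevant information about the
statements), not a prescription taken from the text.

```python
def marginal(priors, likelihoods):
    """Eq. (3.5): (S|X) = sum over i = 0..I of (S|H_i X)(H_i|X)."""
    return sum(like * p for like, p in zip(likelihoods, priors))


def forced_h0_value(priors, known_likelihoods):
    """Eq. (3.7): the value of (S|H_0 X) implied by convention (3.6)."""
    known_part = sum(like * p for like, p in zip(known_likelihoods, priors[1:]))
    return known_part / (1.0 - priors[0])


priors = [0.5, 0.3, 0.2]   # (H_0|X), (H_1|X), (H_2|X): invented values

# Under convention (3.6), (S|H_0 X) coincides with the marginal (S|X).
known = [0.4, 0.1]         # (S|H_1 X), (S|H_2 X): invented values
forced = forced_h0_value(priors, known)
consistent = marginal(priors, [forced] + known)

# The defect: S impossible under H_1 and H_2 becomes impossible under H_0.
forced_zero = forced_h0_value(priors, [0.0, 0.0])

# Assumed alternative: uniform assignment over the N statements of the item.
n_statements = 4
noncommittal = 1.0 / n_statements
marginal_noncommittal = marginal(priors, [noncommittal, 0.0, 0.0])

print(forced, consistent, forced_zero, marginal_noncommittal)
```

With the uniform assignment, a statement that every specified hypothesis
rules out still retains a nonzero probability through the ignorance
hypothesis, which is the behaviour one wants from a noncommittal choice.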

Although we have recognized two very different theoretical situations,
it has now been possible to set up a formalism which covers both cases.
The formulas which we shall derive in the next two sections may be
applied to either case. The difference between the two cases will become
significant only when we consider the information which is to be fed into
the formulas, or the interpretation to be placed upon estimates made by
means of the formulas. We shall discuss this further in the final
section.

4. INDUCTION USING ONE FACT

We now suppose that, by data reduction, the observer has assigned
various probabilities to the statements which constitute the interface.
The theorist has assigned probabilities to the same statements based on
analysis of each considered hypothesis, and also possibly based on a null
analysis of a supplemental hypothesis. This information, and knowledge
of the prior probability of each hypothesis, should enable us to assign
a post probability to each hypothesis. That is, given the information
specified in the preceding section, and given the values of
$(H_i \mid X)$, we should be able to calculate $(H_i \mid OX)$. Here and
in subsequent sections it is convenient to suppress the symbols R, A and
$A_0$. These are implicitly present: the observations have been reduced
by a specified procedure; hypotheses $H_1, \ldots, H_I$ have been reduced
by analysis A and hypothesis $H_0$ (if it must be introduced) has been
reduced by the null analysis $A_0$. In line with this change of notation,
we shall refer to $H_0$ as the "ignorance hypothesis."

In this section, we make the simplifying assumption that there is
only one item to be considered. If we regard the observational knowledge
pertaining to an item as a "fact," this means that we have only one fact
to consider. In this section, therefore, we may drop the suffix α.


As an aside, we may note that this can be regarded as a formal change
rather than a substantive change, since it is always possible to combine
items. Specifically, we could introduce the notation

    S_{n n' n'' \ldots} = S_{1n} S_{2n'} S_{3n''} \ldots    (4.1)

The set of statements S_{n n' n'' ...} again are mutually exclusive and
form a complete set, and they comprise all the information represented by
the separate groups of statements. However, our intention is to consider
one item in this section, and to consider in the next section how one
should combine information derived from several separate items.

Our aim is to calculate (H_i|OX), the "post probabilities" of the
various hypotheses as determined by the prior information X and the
observations O. Since the statements S_n are mutually exclusive and
form a complete set, this probability may be written as

    (H_i | OX) = (H_i \sum_n S_n | OX) ,    (4.2)

where the summation sign indicates a "logical summation." By the sum
rule of probability theory,7 this equation may be expressed as

    (H_i | OX) = \sum_n (H_i S_n | OX) ,    (4.3)

and use of the product rule7 enables us to put this equation in the
following form:

    (H_i | OX) = \sum_n (H_i | S_n OX)(S_n | OX) .    (4.4)

According to the rules of our model, the connection between the
hypotheses and the observations occurs only via the statements S_n. If it
is asserted that S_n is true, all other knowledge about the observations
is irrelevant, as far as the hypotheses are concerned. This property of
our model therefore implies that

    (H_i | S_n OX) = (H_i | S_n X) ,    (4.5)

in consequence of which Equation (4.4) becomes

    (H_i | OX) = \sum_n (H_i | S_n X)(S_n | OX) .    (4.6)


At this stage we use the following form of Bayes' theorem [Equation
(2.3)]:

    (H_i | S_n X) = \frac{(S_n | H_i X)}{(S_n | X)} (H_i | X) .    (4.7)

This introduces the probabilities (S_n|X), which do not appear among our
initial data. It is at this point that we profit from the assumption
that the set of hypotheses are mutually exclusive and form a complete
set. We saw in Section 3 that these assumptions lead to Equation (3.5),
which is now written as

    (S_n | X) = \sum_j (S_n | H_j X)(H_j | X) ,    (4.8)

leaving the range of summation (0, 1, ..., N or 1, 2, ..., N) unspecified.
Then Equation (4.7) becomes


    (H_i | S_n X) = \frac{(S_n | H_i X)}{\sum_j (S_n | H_j X)(H_j | X)} (H_i | X) .    (4.9)

On combining Equations (4.6) and (4.9), we finally arrive at

    (H_i | OX) = \left[ \sum_n \frac{(S_n | H_i X)(S_n | OX)}{\sum_j (S_n | H_j X)(H_j | X)} \right] (H_i | X) .    (4.10)

This equation is the principal result of this section. The above formula
for the post probabilities (H_i|OX) involves only the prior probabilities
(H_i|X) and the probabilities of statements S_n as determined on the one
hand by reduction of the observations O and on the other hand by analysis
of the hypotheses H_i. We may note the following desirable property of
this equation:

    \sum_i (H_i | OX) = 1 .    (4.11)

This shows that the formula (4.10) will never yield a probability of a
hypothesis greater than unity, and that it will show the probability of
one hypothesis to be equal to unity only if it shows the probabilities
of all other hypotheses to be zero.
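
The normalization property can be checked in one line. The following is a sketch of that check, assuming only that the observational probabilities of an item sum to unity, as they must for a complete set of mutually exclusive statements. Summing Equation (4.10) over i and interchanging the order of summation gives

```latex
\sum_i (H_i | OX)
  = \sum_n (S_n | OX)\,
    \frac{\sum_i (S_n | H_i X)(H_i | X)}{\sum_j (S_n | H_j X)(H_j | X)}
  = \sum_n (S_n | OX)
  = 1 .
```

The inner ratio is unity for every n because numerator and denominator are the same sum under different index names.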

5.  INDUCTION  USING  MANY  FACTS 

In the preceding section we obtained a formula to describe the
probabilities to be assigned to a set of hypotheses on the basis of the
prior probabilities and of observational and theoretical knowledge
concerning one item of the interface. We now consider how we may take
account of knowledge concerning several such items. We assume that these
items are "independent" in a sense to be specified later.

It is convenient to introduce F_α as the "fact" associated with
the item I_α. According to our model, the fact F_α comprises the set
of probabilities (S_αn|OX), n = 1, ..., N_α.

We now suppose that a group of hypotheses H_i have been evaluated
in terms of two facts F_1 and F_2, considered separately and
independently. In this way we have arrived at probabilities which may be
written as (H_i|F_1 X), (H_i|F_2 X). The problem which we now consider is
that of determining the probabilities (H_i|F_1 F_2 X). The sense in which
F_1 and F_2 are considered to be independent is the following: knowledge
of F_2 will influence our interpretation of F_1 only through the effect
which F_2 has on our evaluation of the hypotheses H_i and the influence of
knowledge of hypotheses H_i on our interpretation of F_1. We assume of
course that the converse also is true.

We first note that (H_i|F_1 F_2 X) may be expressed as follows:

    (H_i | F_1 F_2 X) = \frac{(H_i F_1 | F_2 X)}{(F_1 | F_2 X)} .    (5.1)

By an argument parallel to that leading to Equation (4.4), we see that

    (H_i F_1 | F_2 X) = \sum_j (H_i F_1 | H_j F_2 X)(H_j | F_2 X)    (5.2)

and

    (F_1 | F_2 X) = \sum_j (F_1 | H_j F_2 X)(H_j | F_2 X) .    (5.3)

We now note that the first term on the right-hand side of (5.2) may be
expressed as

    (H_i F_1 | H_j F_2 X) = (H_i | F_1 H_j F_2 X)(F_1 | H_j F_2 X) .    (5.4)

However, since the set of hypotheses is assumed to be mutually exclusive
and complete,

    (H_i | F_1 H_j F_2 X) = \delta_{ij} .    (5.5)

Furthermore, our specification of the sense in which F_1 and F_2 are
taken to be independent implies that

    (F_1 | H_j F_2 X) = (F_1 | H_j X) .    (5.6)

If we now note from Bayes' theorem that

    (F_1 | H_j X) = \frac{(H_j | F_1 X)}{(H_j | X)} (F_1 | X) ,    (5.7)


we find from Equations (5.2) through (5.7) that Equation (5.1) may be
expressed as

    (H_i | F_1 F_2 X) = \frac{(H_i | F_1 X)(H_i | F_2 X)[(H_i | X)]^{-1}}{\sum_j (H_j | F_1 X)(H_j | F_2 X)[(H_j | X)]^{-1}} .    (5.8)

It is a straightforward matter to prove (by induction!) that the general
formula is

    (H_i | F_1 \ldots F_A X) = \frac{(H_i | F_1 X) \ldots (H_i | F_A X)[(H_i | X)]^{-(A-1)}}{\sum_j (H_j | F_1 X) \ldots (H_j | F_A X)[(H_j | X)]^{-(A-1)}} .    (5.9)

We note from this equation that

    \sum_i (H_i | F_1 \ldots F_A X) = 1 .    (5.10)
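
The inductive step can be sketched as follows. The sketch rests on an assumption that is not spelled out above: the composite fact G = F_1 ... F_A and the new fact F_{A+1} are independent in the same sense as F_1 and F_2. Applying the two-fact result to G and F_{A+1}, and inserting the A-fact formula for (H_i|GX), gives

```latex
(H_i | G F_{A+1} X)
  \propto (H_i | G X)\,(H_i | F_{A+1} X)\,[(H_i | X)]^{-1}
  \propto \Bigl[ \prod_{\alpha=1}^{A} (H_i | F_\alpha X) \Bigr]
          [(H_i | X)]^{-(A-1)}\,(H_i | F_{A+1} X)\,[(H_i | X)]^{-1}
  = \Bigl[ \prod_{\alpha=1}^{A+1} (H_i | F_\alpha X) \Bigr] [(H_i | X)]^{-A} .
```

The factors omitted at each proportionality sign do not depend on i; they are fixed by requiring that the probabilities sum to unity over i, which reproduces the general formula with A replaced by A+1.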


6.  WORK-SHEET  CONCERNING  THE  THEORY  OF  PULSARS 


Since the principal aim of this article is a bookkeeping procedure
rather than a philosophical inquiry, it seems expedient to present a
simple example of the use of the formulas which we have derived, before
indulging in a philosophical discussion of their significance.

The  example  to  be  discussed  concerns  the  current  astrophysical  prob¬ 
lem  of  the  nature  of  pulsars.1®  Although  it  is  now  generally  agreed  that 


a pulsar is to be identified with a rotating neutron star, less than a
year  ago  there  was  a  lively  controversy  concerning  two  possibilities: 
the  rotating  neutron  star  model  and  the  pulsating  white  dwarf  model.  In 
January  1969  the  evidence  had  become  strongly  favorable  to  the  neutron 
star  hypothesis,  and  I  was  interested  to  try  to  assess  the  strength  of 
the case by using the techniques presented in this article. The following
material  is  therefore  to  be  regarded  as  a  personal  worksheet  (which  is 
now  several  months  out  of  date),  not  a  valid  description  of  the  present 
state  of  knowledge  concerning  pulsars. 

The "work-sheet" is presented in Table 1. Four items are here
considered:

1. The range of period.

2. The change of period.

3. Connection with supernovae.

4. Absence of photospheric radiation.

These items will be discussed in turn. In addition to the hypotheses
that the object is a neutron star (H_1) and that the object is a white
dwarf (H_2), we consider the "ignorance" hypothesis (H_0), allowing
for the possibility that there may be some other hitherto unsuspected
explanation.

Before discussing the above items, we should recognize that one item
is conspicuous by its absence: namely, radio emission. The reason this
was not considered is that very little persuasive information was
available concerning radio emission from a neutron-star model or from a
white-dwarf model. Hence, in this respect, each model would fare no better
and no worse than the ignorance hypothesis. Such an item need not be
considered explicitly; it is "ignorable." Similarly, if there is no
observational information about an item, it is ignorable.

Table 1. PULSAR WORK-SHEET

Based on prior probabilities (H_0|X) = (H_1|X) = (H_2|X) = 1/3

Statement                                                   (S|OX)  (S|H_0 X)  (S|H_1 X)  (S|H_2 X)

Item 1. Range of Period
S_11: Periods extend over range .03 sec to 3 sec              1       .5         1          .01
S_12: S_11 not true                                           0       .5         0          .99
Post probabilities (H_0, H_1, H_2)                                    .331       .662       .007

Item 2. Change of Period
S_21: All pulsars slow down                                   .97     .33        1          .09
S_22: Neither S_21 nor S_23 true                              .03     .33        0          .01
S_23: All pulsars speed up                                    0       .34        0          .9
Post probabilities (H_0, H_1, H_2)                                    .256       .681       .063

Item 3. Association with Supernovae
S_31: All pulsars related to supernovae                       .9      .03        .99        0
S_32: Some pulsars related to supernovae                      .1      .01        .01        .01
S_33: No pulsars related to supernovae                        0       .96        0          .99
Post probabilities (H_0, H_1, H_2)                                    .060       .907       .033

Item 4. Photospheric Radiation
S_41: No pulsars have detectable photospheric radiation       .99     .33        1          .01
S_42: Some pulsars have detectable photospheric radiation     .01     .33        0          .09
S_43: All pulsars have detectable photospheric radiation      0       .34        0          .9
Post probabilities (H_0, H_1, H_2)                                    .257       .257       .010

Post probabilities based on all facts                                 .012       .988       1.37 x 10^-6

There is another interesting aspect to the (absent) item concerned
with radio emission. It is concerned not with a possible property of
pulsars, but with a necessary property. In order for an object to be
accepted as a pulsar, it must have certain properties. In the early
days of a phenomenon, such properties will typically be observational.
In order to clarify this point, we introduce for this purpose only a
"zeroth" item:

Item 0. Periodic Pulsed Radio Emission.


The statements might be chosen as follows:

S_01: The object emits detectable periodic pulsed radio emission.

S_02: The object does not emit detectable periodic pulsed radio
emission.

Then a pulsar is "defined" by the statement S_01. Hence it becomes a
convention, associated with the name "pulsar," that (S_01|OX) = 1. Note
that the strict separation between observational facts and theoretical
conclusions applies here also: the convention that (S_01|OX) = 1 should
not prejudice the evaluation of (S_01|H_i X) for any hypothesis.

We may also note that the definition of a phenomenon may at one time
be observational in nature and at another time theoretical in nature.
It may be, for instance, that when pulsars are "fully" understood, they
will be defined by a hypothetical model.17

We  now  return  to  a  discussion  of  Items  1-4. 

Item 1. The Range of Periods.

We adopt the following statements for this item.

S_11: The periods of pulsars extend over a range extending from
.03 sec or below to 3 sec or above.

S_12: The range of periods does not extend as low as .03 sec
and/or does not extend as high as 3 sec.

We know from observational evidence that statement S_11 is correct.
Hence we set (S_11|OX) = 1 and (S_12|OX) = 0.

Here and in Items 2 and 4, we assign equal probabilities to possible
statements of an item, when regarded as consequences of the ignorance
theory. Hence we assign (S_11|H_0 X) = .5, (S_12|H_0 X) = .5.

The neutron star hypothesis is compatible with any periods down to
a minimum value of about 1 msec. Hence we set (S_11|H_1 X) = 1,
(S_12|H_1 X) = 0.

If the pulses were produced by pulsation of a white dwarf, one would
expect the periods to extend over a range from about one second to several
seconds. However, one cannot be completely confident that shorter periods
are quite impossible. Therefore, to be conservative, we say
(S_11|H_2 X) = .01, (S_12|H_2 X) = .99.

We may now use Equation (4.10) to evaluate the post-probabilities
of the three alternative hypotheses, using the information of Item 1.
On denoting by F_1 the observational data relevant to Item 1, we find
that (H_0|F_1 X) = .331, (H_1|F_1 X) = .662 and (H_2|F_1 X) = .007.
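
The arithmetic for Item 1 can be retraced with a few lines of code. The following Python sketch is one possible transcription of Equation (4.10); the function name `post_probabilities` and the list layout are illustrative choices, not part of the original worksheet.

```python
def post_probabilities(prior, obs, theory):
    """Equation (4.10) for a single item.

    prior[i]     : (H_i | X), prior probability of hypothesis i
    obs[n]       : (S_n | OX), from reduction of the observations
    theory[i][n] : (S_n | H_i X), from analysis of hypothesis i
    """
    post = []
    for i, p_i in enumerate(prior):
        total = 0.0
        for n, o_n in enumerate(obs):
            if o_n == 0.0:
                continue  # statement carries no observational weight
            # (S_n | X) = sum_j (S_n | H_j X)(H_j | X), Equation (4.8)
            s_n = sum(theory[j][n] * prior[j] for j in range(len(prior)))
            total += theory[i][n] * o_n / s_n
        post.append(total * p_i)
    return post


prior = [1 / 3, 1 / 3, 1 / 3]   # H_0 (ignorance), H_1 (neutron star), H_2 (white dwarf)
obs = [1.0, 0.0]                # (S_11|OX), (S_12|OX)
theory = [
    [0.5, 0.5],                 # ignorance hypothesis H_0
    [1.0, 0.0],                 # neutron-star hypothesis H_1
    [0.01, 0.99],               # white-dwarf hypothesis H_2
]
item1 = post_probabilities(prior, obs, theory)
print([round(p, 3) for p in item1])   # [0.331, 0.662, 0.007]
```

With the Item 1 inputs quoted above, the rounded output agrees with the three post-probabilities stated in the text.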

Item 2. Rate of Change of Period.

The statements are as shown in the table. Five pulsars are known
to be slowing down. No observable change has been detected for other
pulsars, which we interpret to mean that the rate of change is too small
for its sign to be determined. If we assign equal prior probabilities
to the three possible statements, the fact that five pulsars are known
to be slowing down leads to the observational probabilities shown in the
table.

If pulsars are rotating neutron stars, we expect pulsars to slow
down.

If pulsars are pulsating white dwarfs, we expect them to speed up,
since the white dwarf should become smaller and denser as it ages.
However, without specifying what kind of pulsation we are considering, we
should assign a small probability to the possibility that the pulsations
will slow down. To be cautious, we should also make some allowance for
the possibility that some pulsars slow down and some pulsars speed up.
These considerations lead to the probabilities shown.

The post-probabilities based on data for this item are also shown.

Item 3. Relationship of Pulsars to Supernovae.

It is known that two pulsars are located in the same positions as
supernova remnants. This suggests that all pulsars are related to
supernovae (Statement S_31). However, one should allow for the possibility
that only some pulsars are related to supernovae (Statement S_32). We
can rule out the possibility that no pulsars are related to supernovae
(Statement S_33).

The only idea which has been advanced concerning the creation of
a neutron star is that it is formed during a supernova explosion. The
neutron-star hypothesis therefore leads us to expect that all pulsars
should be related to supernovae. However, to be cautious, we might allow
for the possibility that there is some other way in which neutron stars
can be formed, and therefore assign a small probability, on the basis of
H_1, to S_32.

The current view about white dwarfs is that they represent a state
of senility reached by low-mass stars in a non-catastrophic evolution.
However, the idea has been suggested that a supernova explosion may leave
a white dwarf as end product. We therefore choose the probabilities shown.

In assigning probabilities to the statements of Item 3 for the
ignorance hypothesis, the simplest procedure of assigning equal weight to
the alternatives seems unacceptable, since a supernova remnant is an
unusual object. The key question is: "What is the probability that objects
which are (or appear to be) supernova remnants are associated with some
unspecified object which is neither a neutron star nor a white dwarf?"
Most astronomers would regard the probability as small, but we are alert
to the fact that we must not set the probability as zero. Taking this
question in isolation (i.e., neglecting all other evidence related to
pulsars), I should regard .1 as too high and .01 as too low; accordingly,
I choose (S_31|H_0 X) = .03. I assign a somewhat lower probability to the
possibility that, if pulsars are neither neutron stars nor white dwarfs,
some of them happen to be related to supernova remnants or objects which
look like supernova remnants, setting (S_32|H_0 X) = .01. Hence
(S_33|H_0 X) = .96.


Item  4.  Photospheric  Radiation. 


In  only  one  case  has  a  star-like  optical  object  been  identified  with 
a  pulsar:  this  is  the  south-preceding  star  of  the  Crab  Nebula,  identified 
with  the  Crab  pulsar.  However,  it  has  been  shown  that  the  radiation  of 
this  "star"  consists  of  pulses  similar  to  the  radio  pulses,  so  that  it 
cannot  be  interpreted  as  photospheric  radiation.  There  is  one  normal 
star  near  the  location  of  the  pulsar  CP  1919,  but  this  is  generally 
thought  not  to  be  the  optical  counterpart  of  the  pulsar.  To  avoid  being 
dogmatic, we choose the probabilities listed in the table.

A neutron star is too small for its photospheric radiation to be
optically detectable. On the other hand, a white dwarf should be clearly
visible at the typical distance estimated for a pulsar. However, the
luminosity of a white dwarf decreases steadily as it gets older, so that
it is possible that some white dwarfs of a sample would be invisible, and
we cannot rule out the possibility that all white dwarfs of a given class
may be invisible. These considerations are reflected in the probabilities
listed in Item 4.

We see from the Table that each fact counts against the white-dwarf
hypothesis. However, none could be considered to be conclusive. In
Items 1, 2 and 4, the neutron star hypothesis fares better than the
ignorance hypothesis, but not much better. This is due simply to the fact
that each item has been divided into only a small number of possible
statements. In order to get strong evidence for a hypothesis in
comparison with the ignorance hypothesis, it is necessary to consider an
item divided into a large number of statements, or to have some reason for
assigning very nonuniform weighting to the possible statements on the
basis of the ignorance hypothesis, as in the case of Item 3.

The post-probabilities calculated from each of the four facts may
be combined by means of Equation (5.9). When this is done, we arrive at
the result

    (H_0|OX) = .012

    (H_1|OX) = .988

    (H_2|OX) = 1.37 x 10^-6
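
The combination step can likewise be sketched in code. The function below is an illustrative transcription of Equation (5.9); its name `combine` and its argument layout are assumptions made for this example. The per-item post-probabilities are entered exactly as printed in Table 1, in the order H_0, H_1, H_2.

```python
from math import prod


def combine(prior, per_fact_posts):
    """Equation (5.9): combine post probabilities obtained fact by fact.

    prior[i]             : (H_i | X)
    per_fact_posts[a][i] : (H_i | F_a X) for fact number a
    """
    n_facts = len(per_fact_posts)
    weights = [
        prod(post[i] for post in per_fact_posts) * prior[i] ** (-(n_facts - 1))
        for i in range(len(prior))
    ]
    norm = sum(weights)  # denominator of Equation (5.9)
    return [w / norm for w in weights]


prior = [1 / 3, 1 / 3, 1 / 3]
table1 = [
    [0.331, 0.662, 0.007],   # Item 1
    [0.256, 0.681, 0.063],   # Item 2
    [0.060, 0.907, 0.033],   # Item 3
    [0.257, 0.257, 0.010],   # Item 4, as printed in Table 1
]
combined = combine(prior, table1)
print(round(combined[0], 3), round(combined[1], 3), f"{combined[2]:.2e}")
# 0.012 0.988 1.37e-06
```

Because the prior probabilities are equal, the prior factor cancels between numerator and denominator, and the result is simply the normalized product of the rows.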

We see that the combined facts give very strong evidence against the
white-dwarf hypothesis. If one were to consider the white dwarf
hypothesis and the neutron star hypothesis as being the only possibilities,
then the considerations listed above would show conclusively that pulsars
are to be interpreted as neutron stars. If, however, we give equal prior
probability to the hypothesis that there is some other explanation, then
we find that the evidence for the neutron-star hypothesis is good, but
not overwhelming.

When probabilities become very close to unity or very close to zero,
it is convenient to introduce a change of notation which gives a better
feeling for the magnitude of the effect. The change of notation fits
most easily with the "odds" notation where the odds on a proposition is
defined as P/(1-P), that is, the ratio of the probability that the
proposition is true to the probability that it is not true. The odds on a
proposition can clearly vary from zero to infinity, so that it is
convenient to use logarithmic notation. Good18 recommends that one use
the decibel notation. McCamy,19 in a recent article, recommends the use
of "brigg" (decibrigg, etc.) as a general term, in place of bel.
However, the symbol "db" may be used for either "decibel" or "decibrigg,"
according to taste.

Using the symbol \bar{H} to denote the proposition "H is not true," we
may now express the above result as follows.

    (H_0|OX)/(\bar{H}_0|OX) = .012

    (H_1|OX)/(\bar{H}_1|OX) = 83

    (H_2|OX)/(\bar{H}_2|OX) = 1.37 x 10^-6

These results may be expressed in db-notation as follows:

    Odds on H_0 = -19.2 db,

    Odds on H_1 = 19.2 db,

    Odds on H_2 = -58.6 db.
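
The conversion to db-notation is a one-line calculation. The sketch below assumes the usual decibel convention of ten times the base-10 logarithm of the odds, which is consistent with the figures just quoted; the helper name `odds_db` is chosen for illustration.

```python
from math import log10


def odds_db(p):
    """Odds P/(1-P) on a proposition, expressed in decibels."""
    return 10.0 * log10(p / (1.0 - p))


for name, p in [("H_0", 0.012), ("H_1", 0.988), ("H_2", 1.37e-6)]:
    print(name, round(odds_db(p), 1))
# H_0 -19.2
# H_1 19.2
# H_2 -58.6
```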

We see that, if the only admitted possibilities were the neutron-star
hypothesis and the white-dwarf hypothesis, the odds on the neutron-star
hypothesis would be about 87 db. However, if one admits the ignorance
hypothesis, and gives it equal prior probability with the other two
hypotheses, then the odds on the neutron-star hypothesis is only 19 db.

This demonstrates a view of scientific theories which sometimes finds
expression:20 it is easier to prove a theory wrong than to prove it right.
The high value of the odds on H_1, when H_0 is ignored, is really due
to the high odds against H_2.

The above example shows that a combination of three or four quite
cautious statements can lead to a strong result. We note also that, with
one exception, the statements do not involve numbers. Hence the
application of this type of formalism is not restricted to problems
involving numerical data. It could be applied equally well to biology,
criminology or social science.

We may also point out two defects of the above worksheet. A large
value (.9) was assigned to (S_31|OX). This was due to the fact that
both the Crab pulsar and the Vela pulsar are associated with supernova
remnants. However, this represents only two pulsars out of about
twenty-six. If all pulsars should be equally likely to show this
association, then the evidence is not too impressive. However, observable
supernova remnants are only a few hundred or a few thousand years old so
that this association should be observable only for young pulsars. If one
adopts the neutron-star hypothesis, then one would expect to observe an
association with supernova remnants only for short-period pulsars. This is
what is found to be the case. Hence, in assigning a large value to
(S_31|OX), I was in fact making use of the hypothesis H_1. This means
that I broke one of the basic rules of the game. The interface should
therefore be adjusted accordingly, for instance by taking S_31 to be the
statement "short period pulsars are related to young supernova remnants."

At this date (June 1969), one would face a further difficulty in
drawing up the above worksheet. It has been observed21 that the Vela
pulsar speeded up between 24 February and 3 March 1969, and then slowed
down again. Should one therefore assert that (S_21|OX) = 0 and
(S_22|OX) = 1? Strictly speaking, one should. But this would be
misleading, in the sense that it would not represent the way science is
done.

At this stage, the scientist would begin to modify what he regards
as the "pulsar problem." He would say that the "basic" problem,
representing the "normal behavior" of pulsars, is such that all pulsars
slow down.22 He would regard the behavior of the Vela pulsar as an anomaly,
representing a minor secondary phenomenon, which he will probably not
bother about until he has arrived at an adequate understanding of the
"normal behavior." With this modification in the definition of the
problem, one could still assert in June 1969 that (S_21|OX) = .97,
(S_22|OX) = .03, (S_23|OX) = 0.

7.  DISCUSSION



It is hoped that the theory which has been developed in the preceding
sections will be interesting for two reasons: first, because it
provides a way to combine observational and theoretical evidence to assess
how well a theory explains an observational phenomenon; and, second,
because one can learn some useful lessons from the exercise of trying to
establish a procedure for making this assessment.

We saw in the previous section how the formalism may be used in
practice. However, the formulas which we have derived and the example which
we gave are such that each item is divided up into a finite number of
alternative statements. In some cases, for instance when considering
continuous measurable quantities, it will be necessary to consider a
continuous sequence of statements. The required modification of Equation
(4.10) is straightforward.

We now suppose that the statements are enumerated by a continuous
variable ν rather than the discrete variable n. For instance, the
statement S_ν may be the statement that the measurable quantity Γ has
the value Γ(ν). If we now denote by "S_ν to S_{ν+dν}" the logical sum
of all statements enumerated by ν as it runs from ν to ν+dν, we
can introduce the notation

    (S_\nu to S_{\nu + d\nu} | H_i X) = (S_\nu | H_i X)_\nu \, d\nu , etc.    (7.1)

Using this notation, and replacing the summation sign in Equation (4.10)
by an integration sign, we obtain the formula

    (H_i | OX) = \left[ \int d\nu \, \frac{(S_\nu | H_i X)_\nu (S_\nu | OX)_\nu}{\sum_j (S_\nu | H_j X)_\nu (H_j | X)} \right] (H_i | X) .    (7.2)
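
A numerical version of the continuous formula may make the notation concrete. The following Python sketch approximates the integral by the midpoint rule. Everything specific in it is hypothetical: the Gaussian densities, their means and widths, the integration range, and the function names are chosen only for illustration and do not come from the worksheet.

```python
from math import exp, pi, sqrt


def gaussian(x, mean, sigma):
    """Normal probability density, used here only as a stand-in density."""
    return exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * sqrt(2.0 * pi))


def post_probabilities_continuous(prior, obs_density, theory_densities,
                                  lo, hi, steps=4000):
    """Midpoint-rule approximation to the continuous form of Equation (4.10).

    prior[i]            : (H_i | X)
    obs_density(v)      : observational density of the statement variable
    theory_densities[i] : function giving the density predicted by H_i
    """
    dv = (hi - lo) / steps
    post = [0.0] * len(prior)
    for k in range(steps):
        v = lo + (k + 0.5) * dv
        t = [f(v) for f in theory_densities]
        # Density analogue of Equation (4.8): sum over hypotheses
        s = sum(t_j * p_j for t_j, p_j in zip(t, prior))
        if s == 0.0:
            continue  # no hypothesis gives this value any weight
        w = obs_density(v) * dv / s
        for i, p_i in enumerate(prior):
            post[i] += t[i] * w * p_i
    return post


# Hypothetical inputs: two hypotheses predicting different values of the
# measured quantity, and an observational density centred near the first.
prior = [0.5, 0.5]
theory = [lambda v: gaussian(v, 1.0, 0.5), lambda v: gaussian(v, 3.0, 0.5)]
observed = lambda v: gaussian(v, 1.2, 0.2)
post = post_probabilities_continuous(prior, observed, theory, lo=-5.0, hi=9.0)
print(post)
```

In this made-up case the hypothesis whose prediction lies closer to the observed value receives the larger post probability, and the two values sum to unity to within the accuracy of the integration.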


We now turn to some of the implications of the model. The first is
that it is essential to set up an appropriate and meaningful interface
between observational data and theoretical calculation. Getting to the
interface from observations needs data reduction. An excellent example
of data reduction, in which an observer goes more than half way to meet
the theorist, is given in an article by Ellison.

We may note, as an aside, that reduction is only one link in the
chain, and the absence of agreement between observation and theory may
on occasion be traced to faulty reduction. This point also was made by
Sherlock Holmes:24 "I ought to know by this time that when a fact appears
to be opposed to a long train of deductions, it invariably proves to be
capable of some other interpretation."

The question of bias is very interesting. It is generally recognized
that theorists' conclusions are likely to be biased by knowledge of the
observations. The reason that great weight is attached to predictions is
that these are manifestly free from such bias. It is equally important
that facts stated by an observer should be free from bias due to knowledge
of theory, but theorists are not so concerned about this possibility that
they require observers to make observations before a theory has been
proposed. There is therefore a double standard applied to theorists and
observers. However, it would clearly not be possible to require both
prediction by the theorists and "pre-observation" by the observers.

It is useful to introduce the term "hard fact" for the case that
observations lead to a high probability for one statement of an item and
very small probabilities for all alternative statements. The other type
of fact may be called a "soft fact." Similarly, we may talk about a
"firm conclusion" and a "weak conclusion." We now note that, to get a
good test of a theory, we should be able to compare one or more hard
facts with one or more firm conclusions. In the case that we are matching
a hard fact with a weak conclusion, or a soft fact with a firm conclusion,
we are no better off than if we were comparing a soft fact with a weak
conclusion. In these cases we could say that the strength of the inference
is "theory limited" or "observation limited," respectively. The
economical use of observational effort and theoretical effort requires a
sort of "impedance matching" at the interface!
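
The point about matching can be illustrated numerically with the single-item formula. In the Python sketch below, all probabilities are hypothetical and the labels "hard," "soft," "firm" and "weak" are attached to made-up numbers purely for illustration; the function is one possible transcription of Equation (4.10).

```python
def post_probabilities(prior, obs, theory):
    """Equation (4.10) for a single item (see Section 4)."""
    post = []
    for i, p_i in enumerate(prior):
        total = 0.0
        for n, o_n in enumerate(obs):
            if o_n == 0.0:
                continue  # statement carries no observational weight
            s_n = sum(theory[j][n] * prior[j] for j in range(len(prior)))
            total += theory[i][n] * o_n / s_n
        post.append(total * p_i)
    return post


prior = [0.5, 0.5]                      # two rival hypotheses, equal priors
hard, soft = [0.99, 0.01], [0.6, 0.4]   # hypothetical observational probabilities
firm = [[0.99, 0.01], [0.01, 0.99]]     # hypothetical firm conclusions
weak = [[0.6, 0.4], [0.4, 0.6]]         # hypothetical weak conclusions

hard_firm = post_probabilities(prior, hard, firm)[0]
hard_weak = post_probabilities(prior, hard, weak)[0]   # theory limited
soft_firm = post_probabilities(prior, soft, firm)[0]   # observation limited
print(round(hard_firm, 4), round(hard_weak, 4), round(soft_firm, 4))
# 0.9802 0.598 0.598
```

With these numbers, sharpening only one side of the interface moves the first hypothesis from .5 to about .6, whereas sharpening both moves it to about .98.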

Sometimes a theory will have one or more adjustable parameters.
This possibility may be included in the present formalism by supposing
that we are dealing with a continuous sequence of hypotheses H_λ, where
λ is a continuous variable. With the notation

    (H_\lambda to H_{\lambda + d\lambda} | X) = (H_\lambda | X)_\lambda \, d\lambda , etc.,    (7.3)

Equation (4.10) becomes

    (H_\lambda | OX)_\lambda = \left[ \sum_n \frac{(S_n | H_\lambda X)(S_n | OX)}{\int d\mu \, (S_n | H_\mu X)(H_\mu | X)_\mu} \right] (H_\lambda | X)_\lambda    (7.4)

and Equation (5.9) becomes

    (H_\lambda | F_1 \ldots F_A X)_\lambda = \frac{(H_\lambda | F_1 X)_\lambda \ldots (H_\lambda | F_A X)_\lambda [(H_\lambda | X)_\lambda]^{-(A-1)}}{\int d\mu \, (H_\mu | F_1 X)_\mu \ldots (H_\mu | F_A X)_\mu [(H_\mu | X)_\mu]^{-(A-1)}} .    (7.5)

This method of determining optimum values of parameters of the theory
is very closely related to the "maximum likelihood" method of statistics.

The principal concern of this article has been the problem of
establishing a theory as a correct interpretation of a physical phenomenon.
When alternative theories can be clearly enumerated, there is the
possibility of establishing one of them as correct by proving that the
others are incorrect. When alternative theories cannot be clearly
enumerated (and this is generally the case), the evidence may point
strongly to one of the specified theories, but one must always bear in
mind the possibility that further information will come along which will
disprove that theory. In this sense, the general situation is that, at any
time, there is no "correct" theory of a physical phenomenon; there is only
a "front runner."

Although it is instructive, and may sometimes be helpful, to try to
specify rules for consistent thinking about scientific theories, any
scientist is aware that there are psychological factors as well as
rational factors involved in securing acceptance of a theory. It is no
doubt the psychological factors which led to the following highly
pessimistic observation, attributed to Max Planck: "A new scientific truth
does not triumph by convincing the opposition and making them see the
light, but rather by the opponents dying off and a new generation growing
up to accept it as the truth."